This code accompanies the paper: “Identification of a regeneration organizing cell in the Xenopus tail”, Aztekin, Hiscock et al. 2019.

Load data and preliminary visualizations

counts <- readMM("countsMatrix.mtx")
genes <- read.csv("genes.csv", header = F)$V1
cells<- read.csv("cells.csv", header = F)$V1
rownames(counts) <- genes
colnames(counts) <- cells
meta <- as.data.frame(read.csv("meta.csv"))
labels <- as.data.frame(read.csv("labels.csv"))
clusters <- read.table(file = "clusters.csv", sep = ",", header = T)

We then normalize the counts by library size, and plot UMIs (log10) across our entire dataset.

meta$umi <- Matrix::colSums(counts)
countn <- normalize(counts)
plotUMI(meta)

## NULL

We plot the cell type annotations, using precomputed projection and clustering.

plotAnnotation(meta,clusters)

Visualizing contributions of different conditions

Above, we plotted every cell from our dataset to create a reference projection. Now, we highlight cells from different biological conditions on this reference.

condition <- paste0(meta$DevelopmentalStage,"_", meta$DaysPostAmputation, "dpa")


for (sample in sort(unique(condition))){
  plotSample(which(condition == sample), title = sample, mode = "none")
}

We repeat this, but now retaining batch information.

for (sample in sort(unique(condition))){
  plotSample(which(condition == sample), title = sample, mode = "batch")
}

Gene expression visualization

We plot gene expression (log10 normalized counts).

plotGene(meta, countn,"lef1.S")

plotGene(meta, countn,"lef1.L")

plotGene(meta, countn,"tp63.S")

plotGene(meta, countn,"tp63.S")

plotGene(meta, countn,"krt.L")

plotGene(meta, countn,"hes1.L")

plotGene(meta, countn,"neurod4.L")

plotGene(meta, countn,"tubb3.L")

We also plot gene expression (log10 normalized counts) for different conditions.

for (sample in sort(unique(condition))){
  plotGeneSubset(meta,countn,"lef1.L", which(condition == sample),title = sample)
}

scGSEA analysis

Construct gene expression rankings, on the entire gene list,using AUCell. Note, genes with the same expression level are randomly shuffled; different seeds can be chosen to assess the impact of this shuffling on the results. Here we set the seed to an arbitrary value.

set.seed(42)
rankings <- AUCell_buildRankings(countn, nCores=2, plotStats=F)
## Quantiles for the number of genes detected by cell: 
## (Non-detected genes are shuffled at the end of the ranking. Keep it in mind when choosing the threshold for calculating the AUC).
##  min   1%   5%  10%  50% 100% 
##  376  674  973 1141 1989 8662
## Using 2 cores.

Read in predefined gene sets.

geneSets <- as.list(read.table("genesets.csv", header = T, sep = ",", na.strings = ""))
geneSets <- (lapply(geneSets, na.omit))
cbind(lengths(geneSets))
##                 [,1]
## FGF_ligands       29
## WNT_ligands       14
## BMP_ligands       12
## DELTA_ligands      7
## TGFB_ligands       4
## FGF_receptors      7
## WNT_receptors     28
## BMP_receptors      6
## DELTA_receptors    4
## TGFB_receptors     5

Construct scGSEA scores using AUCell.

scores <- getAUC(AUCell_calcAUC(geneSets, rankings))

Plot scGSEA scores for each cell type

for (geneset in rownames(scores)){
  scBox(scores[geneset,],meta$cluster, clusters, title = geneset)
}

Plot scGSEA scores for a subset of cell types

subset <- c(1,2,3,22:30,46,35:45)
m <- which(meta$cluster %in% clusters$names[subset])
scBox(scores["FGF_ligands",m],meta$cluster[m], clusters[subset,], title = "FGF Ligands")

scBox(scores["FGF_receptors",m],meta$cluster[m], clusters[subset,], title = "FGF Receptors")

Cell cycle analysis

Inferred cell cycle phase is computed using Seurat::CellCycleScoring, with the gene list contained in cellCycleOrthologs.txt [genes 1:42 are s.genes, and 43:91 are g2m.genes). Note there is a slight discrepancy to the cell cycle annotations provided in the array express download (< 0.5% cells); this is due to small differences in the numbers of genes considered.

cc.genes <- read.csv('cellCycleOrthologs.txt',header=FALSE,sep='\t')
cc.genes1<-sapply(tolower((cc.genes$V1)),function(x) paste0(x,".L"))
cc.genes2<-sapply(tolower(cc.genes$V1),function(x) paste0(x,".S"))
s.genes1 <- cc.genes1[1:42]
g2m.genes1 <- cc.genes1[43:91]
s.genes2 <- cc.genes2[1:42]
g2m.genes2 <- cc.genes2[43:91]
s.genes = c(s.genes1,s.genes2)
g2m.genes = c(g2m.genes1,g2m.genes2)

seuratHVG <- CreateSeuratObject(raw.data = countn)

seuratHVG <- CellCycleScoring(object = seuratHVG, s.genes = s.genes, g2m.genes = g2m.genes,
    set.ident = TRUE)

meta$CellCyclePhase <- seuratHVG@meta.data[,6]
table(meta$CellCyclePhase)
## 
##   G1  G2M    S 
## 9840 2657  702
condition <- paste0(meta$DevelopmentalStage,"_", meta$DaysPostAmputation, "dpa")
plotCellCycle(meta,clusters, "all data")

plotCellCycle(meta[which(condition == "st40_0dpa"),],clusters, "ST40_0")

plotCellCycle(meta[which(condition == "st46_0dpa"),],clusters, "ST46_0")

Compare fraction of cycling cells during regeneration. Regenerating tails (ST40, 2dpa) were compared to intact tails (ST46, 0dpa); these timepoints are chosen so that each sample is at the same developmental stage (5dpf). It is known, and observed in our data, that proliferation rates vary significantly over developmental time independent of amputation.

p1 <- getCycling(meta[which(condition == "st40_2dpa"),],clusters)
p2 <- getCycling(meta[which(condition == "st46_0dpa"),],clusters)
diff <- p1 - p2
par(mfrow=c(1, 1), mar=c(10, 5, 4, 8)); barplot(diff[subset],las=2,legend.text = FALSE,args.legend = list(x = "topright", bty = "n", inset=c(-0.15, 0)), cex.names = 0.65, col = sapply(clusters$cols[subset], as.character), ylim = c(0.0,0.5), ylab="change in fraction of proliferating cells")

Heatmap of marker genes

For selected sets of genes and/or cell types, we compute average gene expression (log10 normalized counts) for each cell type. Intensity values are normalized such that the cell type with maximal expression has unit value.

skin <- clusters[1:10,]
skinGenes <- as.vector(read.csv("skinGenes.csv", header = F)$V1)
M <- heatmap_xenify(countn,skinGenes,cells = which(meta$cluster %in% skin$names), condition = meta$cluster, clusters = clusters)

blood <- clusters[11:21,]
bloodGenes <- as.vector(read.csv("bloodGenes.csv", header = F)$V1)
M <- heatmap_xenify(countn,bloodGenes,cells = which(meta$cluster %in% blood$names), condition = meta$cluster, clusters = clusters)

muscle <- clusters[22:34,]
muscleGenes <- as.vector(read.csv("muscleGenes.csv", header = F)$V1)
M <- heatmap_xenify(countn,muscleGenes,cells = which(meta$cluster %in% muscle$names), condition = meta$cluster, clusters = clusters)

neural <- clusters[35:46,]
neuralGenes <- as.vector(read.csv("neuralGenes.csv", header = F)$V1)
M <- heatmap_xenify(countn,neuralGenes,cells = which(meta$cluster %in% neural$names), condition = meta$cluster, clusters = clusters)

M <- heatmap_xenify(countn,genes = c(skinGenes,bloodGenes,muscleGenes,neuralGenes), condition = meta$cluster, clusters = clusters,aspect.ratio = 2, xSize = 2.5, ySize = 2.5, xAngle = 90, yAngle = 0)

skin <- clusters[1:10,]
keratinGenes <- as.vector(read.csv("keratinGenes.csv", header = F)$V1)
M <- heatmap(countn,genes = keratinGenes,cells = which(meta$cluster %in% skin$names),condition = meta$cluster, conditionList = skin$names, norm = "max", aspect.ratio = 2.5, xSize = 7.5, ySize = 7.5)

rocGenes <- as.vector(read.csv("rocGenes.csv", header = F)$V1)
M <- heatmap_xenify(countn,rocGenes, condition = meta$cluster, clusters = clusters, xAngle = 90, aspect.ratio = 0.5)

We plot the average expression of krt.L (log10 normalized counts) across biological conditions, for selected cell types,

rocs <- which(meta$cluster == "ROCs")
goblet <- which(meta$cluster == "Goblet cell")
epidermis <- which(meta$cluster == "Epidermis")
krt <- log10(countn[which(rownames(countn) == "krt.L"),]+1)
condition <- paste0(meta$DevelopmentalStage,"_", meta$DaysPostAmputation, "dpa")

krtGOB <- aggregate(krt[goblet],list(condition[goblet]),mean)$x
krtROCs <- aggregate(krt[rocs],list(condition[rocs]),mean)$x
krtEPI <- aggregate(krt[epidermis],list(condition[epidermis]),mean)$x

dat <- cbind(krtGOB,krtROCs, krtEPI)
colnames(dat) <- clusters$names[1:3]
rownames(dat) <- sort(unique(condition))


melted <- melt((dat))


ggplot(data = melted, mapping = aes(x=Var1, y=Var2, fill=value)) +
    theme_bw() +
    geom_tile() +
    scale_fill_gradient(low = "white",high = "steelblue") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 12), axis.text.y = element_text(angle = 0, size = 10), aspect.ratio = 0.5, axis.title.x = element_blank(), axis.title.y = element_blank()) +
    theme(axis.title = element_blank(),  axis.ticks = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank())

Differential abundance statistics

First, we plot the cell type abundances (log10 fraction of all cells) for each biological replicate.

abundances <- t(table(meta$cluster, meta$sample))
fraction <- (abundances / rowSums(abundances))


log10fraction <- log10(t(fraction))

m1 <- match(clusters$names,rownames(log10fraction))
m2 <- c("SIGAA8","SIGAA5","SIGAG10","SIGAD12","SIGAA10","SIGAB8","SIGAE12","SIGAB5","SIGAC12","SIGAB10","SIGAC8","SIGAA11","SIGAF12", "SIGAC10")
m2 <- match(m2,colnames(log10fraction))

log10fraction <- log10fraction[m1,m2]
melted <- melt((log10fraction))

melted$value[is.infinite(melted$value)] <- min(melted$value[which(!is.infinite(melted$value))])

ggplot(data = melted, mapping = aes(x=Var1, y=Var2, fill=value)) +
    theme_bw() +
geom_tile(aes(fill = value), colour = "white") + scale_fill_gradient(low = "white",high = "steelblue")+
    theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 6), axis.text.y = element_text(angle = 0, size = 8), aspect.ratio = 0.5, axis.title.x = element_blank(), axis.title.y = element_blank()) +
    theme(axis.title = element_blank(),  axis.ticks = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank())

Next, we peform differential abundance analysis. We remove red blood cells which are observed in experiment to have variable capture efficiencies. We only consider cell types with a mean abundance per sample above 5, as suggested in https://bioconductor.org/packages/release/bioc/vignettes/cydar/inst/doc/cydar.html.

meta_remove_blood <- meta[-which(meta$cluster %in% clusters$names[c(18:21)]),]
da <- differentialAbundance(meta_remove_blood,labels,sample1 = "ST40_1",sample2 = "ST46_1", minN = 5, pthresh = 0.3, clusters = clusters)

Package information

sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.6
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] doRNG_1.7.1     rngtools_1.3.1  pkgmaker_0.27   registry_0.5   
##  [5] foreach_1.4.4   Seurat_2.3.4    cowplot_0.9.3   reshape2_1.4.3 
##  [9] ggrepel_0.8.0   edgeR_3.24.1    limma_3.38.3    AUCell_1.4.0   
## [13] igraph_1.2.2    Matrix_1.2-15   ggplot2_3.1.0   reticulate_1.10
## 
## loaded via a namespace (and not attached):
##   [1] snow_0.4-3                  backports_1.1.3            
##   [3] Hmisc_4.1-1                 plyr_1.8.4                 
##   [5] lazyeval_0.2.1              GSEABase_1.44.0            
##   [7] splines_3.5.1               BiocParallel_1.16.2        
##   [9] GenomeInfoDb_1.18.1         digest_0.6.18              
##  [11] htmltools_0.3.6             lars_1.2                   
##  [13] gdata_2.18.0                magrittr_1.5               
##  [15] checkmate_1.8.5             memoise_1.1.0              
##  [17] doParallel_1.0.14           cluster_2.0.7-1            
##  [19] mixtools_1.1.0              ROCR_1.0-7                 
##  [21] annotate_1.60.0             matrixStats_0.54.0         
##  [23] R.utils_2.7.0               colorspace_1.3-2           
##  [25] blob_1.1.1                  xfun_0.4                   
##  [27] dplyr_0.7.8                 crayon_1.3.4               
##  [29] RCurl_1.95-4.11             jsonlite_1.6               
##  [31] graph_1.60.0                bindr_0.1.1                
##  [33] zoo_1.8-4                   ape_5.2                    
##  [35] survival_2.43-3             iterators_1.0.10           
##  [37] glue_1.3.0                  gtable_0.2.0               
##  [39] zlibbioc_1.28.0             XVector_0.22.0             
##  [41] DelayedArray_0.8.0          kernlab_0.9-27             
##  [43] prabclus_2.2-6              BiocGenerics_0.28.0        
##  [45] DEoptimR_1.0-8              scales_1.0.0               
##  [47] mvtnorm_1.0-8               DBI_1.0.0                  
##  [49] bibtex_0.4.2                Rcpp_1.0.0                 
##  [51] dtw_1.20-1                  metap_1.0                  
##  [53] xtable_1.8-3                htmlTable_1.12             
##  [55] proxy_0.4-22                foreign_0.8-71             
##  [57] bit_1.1-14                  mclust_5.4.2               
##  [59] SDMTools_1.1-221            Formula_1.2-3              
##  [61] tsne_0.1-3                  stats4_3.5.1               
##  [63] httr_1.4.0                  htmlwidgets_1.3            
##  [65] gplots_3.0.1                RColorBrewer_1.1-2         
##  [67] fpc_2.1-11.1                acepack_1.4.1              
##  [69] modeltools_0.2-22           ica_1.0-2                  
##  [71] pkgconfig_2.0.2             XML_3.98-1.16              
##  [73] R.methodsS3_1.7.1           flexmix_2.3-14             
##  [75] nnet_7.3-12                 locfit_1.5-9.1             
##  [77] labeling_0.3                tidyselect_0.2.5           
##  [79] rlang_0.3.0.1               later_0.7.5                
##  [81] AnnotationDbi_1.44.0        munsell_0.5.0              
##  [83] tools_3.5.1                 RSQLite_2.1.1              
##  [85] ggridges_0.5.1              evaluate_0.12              
##  [87] stringr_1.3.1               yaml_2.2.0                 
##  [89] npsurv_0.4-0                knitr_1.21                 
##  [91] bit64_0.9-7                 fitdistrplus_1.0-11        
##  [93] robustbase_0.93-3           caTools_1.17.1.1           
##  [95] purrr_0.2.5                 RANN_2.6                   
##  [97] bindrcpp_0.2.2              nlme_3.1-137               
##  [99] pbapply_1.3-4               mime_0.6                   
## [101] R.oo_1.22.0                 hdf5r_1.0.1                
## [103] compiler_3.5.1              rstudioapi_0.8             
## [105] png_0.1-7                   lsei_1.2-0                 
## [107] statmod_1.4.30              tibble_1.4.2               
## [109] stringi_1.2.4               lattice_0.20-38            
## [111] trimcluster_0.1-2.1         pillar_1.3.1               
## [113] lmtest_0.9-36               Rdpack_0.10-1              
## [115] irlba_2.3.2                 data.table_1.11.8          
## [117] bitops_1.0-6                gbRd_0.4-11                
## [119] httpuv_1.4.5                GenomicRanges_1.34.0       
## [121] R6_2.3.0                    latticeExtra_0.6-28        
## [123] promises_1.0.1              KernSmooth_2.23-15         
## [125] gridExtra_2.3               IRanges_2.16.0             
## [127] codetools_0.2-15            MASS_7.3-51.1              
## [129] gtools_3.8.1                assertthat_0.2.0           
## [131] SummarizedExperiment_1.12.0 withr_2.1.2                
## [133] S4Vectors_0.20.1            GenomeInfoDbData_1.2.0     
## [135] diptest_0.75-7              parallel_3.5.1             
## [137] doSNOW_1.0.16               grid_3.5.1                 
## [139] rpart_4.1-13                tidyr_0.8.2                
## [141] class_7.3-14                rmarkdown_1.11             
## [143] segmented_0.5-3.0           Rtsne_0.15                 
## [145] Biobase_2.42.0              shiny_1.2.0                
## [147] base64enc_0.1-3